"Found in Translation": Predicting Outcomes of Complex Organic Chemistry Reactions using Neural Sequence-to-Sequence Models
Authors
Abstract
There is an intuitive analogy between an organic chemist's understanding of a compound and a language speaker's understanding of a word. Consequently, it is possible to introduce the basic concepts of linguistic analysis to organic chemistry and to analyze their potential impact. In this work, we cast the reaction prediction task as a translation problem by introducing a template-free sequence-to-sequence model that is trained end-to-end and is fully data-driven. We propose a novel tokenization scheme that is arbitrarily extensible with reaction information. With this approach, we demonstrate top-1 accuracy superior to the state-of-the-art solution by a significant margin. Specifically, our approach achieves an accuracy of 80.3% without relying on auxiliary knowledge such as reaction templates. It also reaches 65.4% accuracy on a larger and noisier dataset.
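To make the translation framing concrete, the sketch below splits a reaction SMILES string into a source token sequence (reactants and reagents) and a target token sequence (products). It is only an illustration under stated assumptions: the regular expression, the function names `tokenize` and `to_translation_pair`, and the use of `>` as a separator token are hypothetical choices and are not taken from the paper.

```python
import re
from typing import List, Tuple

# Illustrative atom-wise SMILES pattern (an assumption, not the paper's own).
# Bracket atoms and two-letter halogens are listed first so they stay whole.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\*|%[0-9]{2}|[0-9])"
)


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into atom-level and bond-level tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Reject input that the pattern cannot cover completely.
    if "".join(tokens) != smiles:
        raise ValueError(f"cannot tokenize: {smiles!r}")
    return tokens


def to_translation_pair(reaction: str) -> Tuple[List[str], List[str]]:
    """Turn 'reactants>reagents>products' into (source, target) token lists."""
    reactants, reagents, products = reaction.split(">")
    # Hypothetical convention: keep '>' as a token between reactants and reagents.
    source = tokenize(reactants) + [">"] + tokenize(reagents)
    target = tokenize(products)
    return source, target


if __name__ == "__main__":
    src, tgt = to_translation_pair("CC(=O)Cl.OCC>>CC(=O)OCC")
    print(" ".join(src))
    print(" ".join(tgt))
```

A sequence-to-sequence model would then be trained to map the source tokens to the target tokens, in the same way a translation model maps a sentence in one language to a sentence in another.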
Similar resources
Linking the Neural Machine Translation and the Prediction of Organic Chemistry Reactions
Finding the main product of a chemical reaction is one of the important problems of organic chemistry. This paper describes a method of applying a neural machine translation model to the prediction of organic chemical reactions. In order to translate 'reactants and reagents' to 'products', a gated recurrent unit based sequence-to-sequence model and a parser to generate input tokens for model fr...
Rhombellanic Crystals and Quasicrystals
Design of some crystal and quasicrystal networks, based on rhombellane tiling, is presented. [1.1.1]Propellane is a synthesized organic molecule; its hydrogenated form, bicyclo[1.1.1]pentane, may be represented by the complete bipartite graph K2,3, which is the smallest rhombellane. Topology of translational and radial structures involving rhombellanes is described in terms of vertex symbol, c...
Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM
Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research shows that using deep neural networks (DNNs) in speech recognition systems significantly improves the performance of these systems. DNN-based phoneme recognition systems have two phases: training and testing. Mos...
The modified recombinant proinsulin: a simple and efficient route to produce insulin glargine in E. coli
Background: Recombinant insulin glargine, a long-acting analogue of insulin, is expressed as proinsulin in the host cell and, after purification and refolding steps, is cleaved to active insulin by enzymatic digestion using trypsin and carboxypeptidase B. Since the proinsulin's B and C chains have several internal arginine and lysine residues, a number of impurities are generated following treatment wit...
Neural networks for the prediction of organic chemistry reactions
Reaction prediction remains one of the great challenges for organic chemistry. Solving this problem computationally requires the programming of a vast amount of knowledge and intuition of the rules of organic chemistry and the development of algorithms for their application. It is desirable to develop algorithms that, like humans, "learn" from being exposed to examples of the application of the...
Journal: CoRR
Volume: abs/1711.04810
Publication date: 2017